Skip to content

Conversation

@sjoerdmeijer
Copy link
Collaborator

This sets the cache line size to 64 for the Neoverse V2 and V3. I've tested this with loop-interchange: it doesn't result in extra compile-times, but it does enable a lot more interchange.

I've also set this for V3, have not tested this, but looks like the sensible thing to do, but am happy to remove this.

This sets the cache line size to 64 for the Neoverse V2 and V3.  I've
tested this with loop-interchange: it doesn't result in extra
compile-times, but it does enable a lot more interchange.

I've set this to V3, have not tested this, but looks like the sensible
thing to do, but am happy to remove this.
@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Sjoerd Meijer (sjoerdmeijer)

Changes

This sets the cache line size to 64 for the Neoverse V2 and V3. I've tested this with loop-interchange: it doesn't result in extra compile-times, but it does enable a lot more interchange.

I've also set this for V3, have not tested this, but looks like the sensible thing to do, but am happy to remove this.


Full diff: https://github.com/llvm/llvm-project/pull/148213.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+1)
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index 68ed10570a52f..4ac93526295aa 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -270,6 +270,7 @@ void AArch64Subtarget::initializeProperties(bool HasMinSize) {
     break;
   case NeoverseV2:
   case NeoverseV3:
+    CacheLineSize = 64;
     EpilogueVectorizationMinVF = 8;
     MaxInterleaveFactor = 4;
     ScatterOverhead = 13;

@kasuga-fj
Copy link
Contributor

it doesn't result in extra compile-times, but it does enable a lot more interchange.

Exactly, and that ended up exposing another correctness issue...
Sorry, I hadn’t really looked at the details of applied interchanges, so I totally missed it. Just noticed it a moment ago.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't result in extra compile-times, but it does enable a lot more interchange.

Exactly, and that ended up exposing another correctness issue... Sorry, I hadn’t really looked at the details of applied interchanges, so I totally missed it. Just noticed it a moment ago.

For checking correctness of LoopInterchange, it would probably be good to test with interchanging all loops that are considered legal, ignoring the cost model.

@kasuga-fj
Copy link
Contributor

it doesn't result in extra compile-times, but it does enable a lot more interchange.

Exactly, and that ended up exposing another correctness issue... Sorry, I hadn’t really looked at the details of applied interchanges, so I totally missed it. Just noticed it a moment ago.

For checking correctness of LoopInterchange, it would probably be good to test with interchanging all loops that are considered legal, ignoring the cost model.

It makes perfect sense to me.

Copy link
Contributor

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this change, I think it's fine as long as it follows the specifications of Neoverse V2 and V3 (though I'm not familiar with them). However, I think it would be better to merge this after correctness issues of LoopInterchange are resolved to avoid some troubles.

@sjoerdmeijer
Copy link
Collaborator Author

@kasuga-fj : thanks for the report, shall I look at that miscompilation since you're probably busy with the delinearization?

@fhahn : yeah that makes sense. I am going to do some more testing. I might put up a little patch introducing an internal option to ignore the cost-model, I can see how this can be useful for testing and experimentation.

@kasuga-fj
Copy link
Contributor

@kasuga-fj : thanks for the report, shall I look at that miscompilation since you're probably busy with the delinearization?

I've probably got a fix in mind, so I can take care of it next week if that's okay. But if it's urgent, feel free to take it over.

@sjoerdmeijer
Copy link
Collaborator Author

@kasuga-fj : thanks for the report, shall I look at that miscompilation since you're probably busy with the delinearization?

I've probably got a fix in mind, so I can take care of it next week if that's okay. But if it's urgent, feel free to take it over.

Sure, go ahead if you already have a way of fixing this.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

64 sounds good if this does something useful nowadays. It could probably be the default for all CPUs.

@sjoerdmeijer
Copy link
Collaborator Author

FYI: I have kicked off different test jobs and am running different fuzzers over the weekend. I am testing this with interchange enabled, the cache line set to 64, and have disable the cost model, to test interchange as much as possible as suggested. I have already found one unrelated issue, #133922, but will report back next week on the interchange results.

@sjoerdmeijer
Copy link
Collaborator Author

64 sounds good if this does something useful nowadays. It could probably be the default for all CPUs.

I think so too. The default of 0 doesn't make an awful lot of sense.

@sjoerdmeijer
Copy link
Collaborator Author

I will merge this a bit later today. I think this is mostly orthogonal to interchange. And while interchange is off by default, this is a good setting to get more test coverage.

@sjoerdmeijer sjoerdmeijer merged commit d26106d into llvm:main Jul 15, 2025
11 checks passed
@sjoerdmeijer sjoerdmeijer deleted the cachelinesize64 branch July 15, 2025 13:59
tru pushed a commit that referenced this pull request Jul 16, 2025
This sets the cache line size to 64 for the Neoverse V2 and V3. I've
tested this with loop-interchange: it doesn't result in extra
compile-times, but it does enable a lot more interchange.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants